We study the notion of approximate entropy within the framework of network theory. Approximate entropy is an uncertainty measure originally proposed in the context of dynamical systems and time series. We first define a purely structural entropy obtained by computing the approximate entropy of the so-called slide sequence. This is a surrogate of the degree sequence, suggested by the frequency partition of a graph. We examine this quantity for standard scale-free and Erdős–Rényi networks. By using classical results of Pincus, we show that our entropy measure converges with network size to a certain binary Shannon entropy. In a second step, with specific attention to networks generated by dynamical processes, we investigate the approximate entropy of horizontal visibility graphs. Visibility graphs naturally associate to a network the notion of temporal correlations, thereby giving the measure a dynamical interpretation. We show that approximate entropy distinguishes visibility graphs generated by processes with different complexity. This result further supports the use of these networks for the study of dynamical systems. Finally, applications to certain biological data arising in cancer genomics are considered in the light of both approaches.
I. INTRODUCTION

Concepts such as information, entropies, and measures of complexity are closely connected subjects within the core of dynamical systems and chaos theory. The area benefits from a wealth of literature that dates back to the works of Kolmogorov on metric entropies, and continues with the developments of Sinai, Eckmann, Ruelle, and others (see … and references therein for a review on the topic). Roughly speaking, this branch of science is relatively mature for answering questions such as how a system that is sensitive to initial conditions (with positive characteristic Lyapunov exponents) generates uncertainty as time evolves, and how this entropy production is related to the structure (invariant measure) of the system.

In recent years, in parallel with the advent of the study of complex networks (see, for example, [24, …]), similar ideas aiming to describe the amount of organization of these systems have started to take root. As a matter of fact, describing mathematically the amount of heterogeneity and complexity found in natural and technological networks is nowadays a major endeavor in the frameworks of network theory, general data analysis, and inference. Several recent works point towards an entropic origin for a variety of key properties of the complex networks that we find around us, such as biodiversity maintenance in ecological networks [1, 2], or, more generally, the emergence of robust degree-degree correlations [17] and communities in social and biological networks […]. Indeed, the amount of heterogeneity in a network is a basic ingredient for quantifying properties of diffusion processes, like the spread of human epidemics, computer viruses, etc. […].

Some theoretical approaches to the notion of network heterogeneity include the references […], where a statistical mechanics perspective is adopted to estimate the (thermodynamic) entropy of network ensembles defined by a set of constraints. Other lines of research make use of spectral theory to derive optimal network configurations […]. However, the majority of proposed network-based entropic functionals are, so far, entropies à la Shannon (see also [11] for a recent review). Less work has been reported on the extension of other invariant measures to the network-theoretic context.

To help fill this gap, in this paper we consider the notion of approximate entropy as introduced by Pincus [29]. This is a finite-size statistic of the Eckmann–Ruelle entropy, originally proposed as a measure of the complexity of a system that changes in time; time series are in fact its main area of application. We explore network-based extensions of approximate entropy, and ask how useful this parameter is for estimating degrees of uncertainty, both for static and for growing networks. Note that any attempt to define an approximate entropy of network parameters (along the same lines as Pincus) must be based on some ordering of the data. As a starting point, it is natural to consider certain orderings associated with the degree sequence and its related quantities.

The present work contains several contributions: (1) the general point is the novelty of studying approximate entropy in a network-theoretic context; (2) this requires translating network parameters into a time series that is amenable to analytic investigation; accordingly, we define a binary string associated to the degree sequence, as suggested by the notion of frequency partition; (3) as expected, given that the string reflects some coarse-grained properties of the degree sequence, we show that the approximate entropy of the string distinguishes between common network ensembles; (4) we then consider visibility graphs, since these objects are associated to dynamical systems and therefore present a natural time ordering. We show that the approximate entropy of these visibility graphs allows us to distinguish between series generated by different types of processes.

The remainder of the paper is organized as follows. In Section II we recall the definition and main properties of approximate entropy. In Section III we study how to extend this notion to the context of networks, defining a network-based approximate entropy. Given this measure, we study both the case of static networks (Section IV), where the measure is purely structural, and the case of growing networks (Section V), where the measure acquires a more dynamical meaning. This latter situation is studied within the context of visibility graphs [… , 22]. The measure is finally tested in Section VI with real data. By considering networks constructed from data obtained in the context of cancer statistics, we probe the ability of the measure to distinguish amongst different cancer phenotypes. A discussion is finally presented in Section VII. The main open question concerns the study of approximate entropies of parameters beyond the degrees. One may take a number of different approaches depending on the parameters considered. Sequences of combinatorial nature obtained from counting paths, and sequences of algebraic nature such as graph spectra, seem to be good candidates for this purpose. Concerning the latter idea, it is an open direction to determine whether approximate entropy has any role in characterizing matrix ensembles when applied to their spectra.

II. APPROXIMATE ENTROPY 

We begin by recalling the notion of approximate entropy. Due to Pincus [29], its definition is based on ideas of Eckmann, Ruelle, and ultimately Kolmogorov–Sinai. Although it was originally defined for time series, when the series is drawn from an alphabet of finitely many symbols it has a powerful combinatorial interpretation due to Rukhin [36]. In general, however, approximate entropy has a geometric interpretation obtained by comparing "densities" of the Takens embedding of the time series in dimensions $m$ and $m+1$. Indeed, let $m \in \mathbb{N}$, $r \in \mathbb{R}_{>0}$, let $u = (u(t))_{t=1}^{N}$ be a time series of $N$ points, and consider the $m$th Takens delay embedding map $x(t) = (u(t), u(t+1), \dots, u(t+m-1))$, with image $X_m = \{x(1), \dots, x(n)\} \subset \mathbb{R}^m$, where $n = N - (m-1)$. Recall that if $u$ arises from a dynamical system with a strange attractor of box dimension $d$ then, when $m > 2d$, the image $X_m$ "reconstructs" the strange attractor in an appropriate sense. We write $x_i(t)$ for the $i$th component of $x \in \mathbb{R}^m$ and let $\|\cdot\|_\infty$ be the usual $L^\infty$ norm on $\mathbb{R}^m$, i.e. $\|x\|_\infty = \max_i |x_i|$. Denoting

$$\Phi_m(r) = -\frac{1}{|X_m|} \sum_{x \in X_m} \log \frac{|\{y \in X_m : \|x - y\|_\infty \le r\}|}{|X_m|},$$

the approximate entropy of the $N$ data points $u$ is defined as

$$\mathrm{ApEn}(m, r, N) = \Phi_{m+1}(r) - \Phi_m(r),$$

with the convention that $\mathrm{ApEn}(0, r, N) = \Phi_1(r)$ and $X_0 = \emptyset$. The upshot is that small values of ApEn imply strong regularity (or persistence), whilst large values amount to considerable irregularity in the time series $u$. From a concrete point of view, approximate entropy is interpreted as a measure of randomness of a finite sequence. To complete the geometric picture, observe that the Eckmann–Ruelle entropy is recovered in the small $r$, large $m$ limit:

$$\lim_{r \to 0} \, \lim_{m \to \infty} \, \lim_{N \to \infty} \mathrm{ApEn}(m, r, N).$$

Amongst the practical uses of approximate entropy, for example when studying time-series data of financial markets […] or heart and EEG data […], [37], the literature often uses $m = 1$ or $2$, together with $r$ proportional to the standard deviation of the series.
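As a rough illustration of the definition above, the following Python sketch (our own, not the implementation behind the results in this paper) computes $\mathrm{ApEn}(m, r, N)$ directly. It forms the delay embedding $X_m$ with NumPy's `sliding_window_view` (NumPy 1.20 or later is assumed), counts for each point the embedded points within sup-norm distance $r$ (the point itself included), and returns $\Phi_{m+1}(r) - \Phi_m(r)$ with $\Phi_0 = 0$. The function name `apen` and the test series are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def apen(u, m, r):
    """ApEn(m, r, N) = Phi_{m+1}(r) - Phi_m(r), with the sign convention of the text.

    Phi_k(r) = -(1/|X_k|) * sum_{x in X_k} log(C_x / |X_k|), where X_k is the k-th
    delay embedding of u and C_x counts the points of X_k within sup-norm distance r
    of x (x itself included). Phi_0 is taken to be 0, so ApEn(0, r, N) = Phi_1(r).
    """
    u = np.asarray(u, dtype=float)

    def phi(k):
        if k == 0:
            return 0.0
        X = sliding_window_view(u, k)  # rows x(t) = (u(t), ..., u(t+k-1)); |X_k| = N - k + 1
        n = len(X)
        total = 0.0
        for x in X:
            c = np.count_nonzero(np.max(np.abs(X - x), axis=1) <= r)
            total += np.log(c / n)
        return -total / n

    return phi(m + 1) - phi(m)


rng = np.random.default_rng(0)
noise = rng.random(1000)
regular = np.tile([0.0, 1.0], 500)
print(apen(noise, 2, 0.2 * noise.std()))  # irregular series: large value
print(apen(regular, 2, 0.2))              # period-2 series: close to 0
```

The tolerance $r = 0.2 \times$ standard deviation in the usage lines is one common choice consistent with the practice described above, not a value prescribed here; the per-point loop costs $O(N^2)$ and is intended only for short series.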

The combinatorial picture gives the key insight for estimating the limiting distribution of approximate entropy and also for identifying sequences of extremal approximate entropy. Indeed, let $u \in \{0, 1, \dots, S-1\}^N$ be a sequence of length $N$ on $S$ symbols (here we take $0 < r < 1$). Let $\nu(I) = \nu_{i_1 \dots i_m}$ be the frequency with which the block $I = (i_1, \dots, i_m) \in \{0, 1, \dots, S-1\}^m$ occurs in $\tilde{u} = (u_1, \dots, u_N, u_1, \dots, u_{m-1})$. This amounts to the frequency of the block in $u$ arranged around a circle. As before, set

$$\Phi_m = -\sum_{I \in \{0, 1, \dots, S-1\}^m} \nu(I) \log \nu(I).$$

The modified approximate entropy is

$$\widetilde{\mathrm{ApEn}}(m) := \Phi_{m+1} - \Phi_m,$$

so that its computation amounts purely to counting the relative frequencies associated with every possible block of length $m$ occurring in the sequence. This allowed Rukhin to obtain analytic proofs of the distribution of ApEn: for fixed embedding dimension $m$, we then have

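$$2N\left(\log S - \widetilde{\mathrm{ApEn}}(m)\right) \xrightarrow{\;d\;} \chi^2\!\left(S^{m+1} - S^m\right),$$

so that the same behaviour follows for $\mathrm{ApEn}(m)$ because

$$N\left(\mathrm{ApEn}(m) - \widetilde{\mathrm{ApEn}}(m)\right) = \mathcal{O}_P\!\left(\frac{1}{N}\right).$$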
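For a finite alphabet the modified statistic only needs block counts. The Python sketch below (helper names are hypothetical) computes $\Phi_m$ from circular block frequencies, the modified approximate entropy $\widetilde{\mathrm{ApEn}}(m)$, and Rukhin's statistic $2N(\log S - \widetilde{\mathrm{ApEn}}(m))$. For i.i.d. uniform bits its average over many sequences should be close to the $\chi^2$ mean $S^{m+1} - S^m$; the sequence length, number of repetitions and seed are arbitrary choices.

```python
import math
import random
from collections import Counter


def circular_block_entropy(u, m):
    """Phi_m = -sum_I nu(I) log nu(I), nu(I) = frequency of block I in u read around a circle."""
    if m == 0:
        return 0.0
    N = len(u)
    wrapped = list(u) + list(u[: m - 1])  # (u_1, ..., u_N, u_1, ..., u_{m-1}); assumes m <= N
    counts = Counter(tuple(wrapped[t : t + m]) for t in range(N))
    return -sum((c / N) * math.log(c / N) for c in counts.values())


def apen_tilde(u, m):
    """Modified approximate entropy Phi_{m+1} - Phi_m."""
    return circular_block_entropy(u, m + 1) - circular_block_entropy(u, m)


def rukhin_statistic(u, m, S=2):
    """2N(log S - ApEn~(m)); asymptotically chi^2(S^{m+1} - S^m) for i.i.d. uniform symbols."""
    return 2 * len(u) * (math.log(S) - apen_tilde(u, m))


rng = random.Random(1)
stats = [rukhin_statistic([rng.randrange(2) for _ in range(1000)], 1) for _ in range(500)]
print(sum(stats) / len(stats))  # compare with S^{m+1} - S^m = 2 for S = 2, m = 1
```

For example, the circular sequence $0011$ contains each block of length 2 exactly once, so $\widetilde{\mathrm{ApEn}}(1) = \log 2$, the largest possible value for $S = 2$.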



III. APPROXIMATE ENTROPIES OF A NETWORK



The degree sequence of a (finite, unweighted, and undirected) network $G$ with nodes labelled $\{1, \dots, N\}$ is $d = (d_1, \dots, d_N)$, where, after relabelling the nodes if necessary, $d_1 \ge \dots \ge d_N$ and $d_i = \deg(i)$ [25]. Note at this point that not all decreasing sequences of integers are degree sequences of networks. The best-known criterion was given by Erdős and Gallai [39]: $d_1 \ge \dots \ge d_N$ is the degree sequence of a network with $N$ nodes if and only if $\sum_i d_i$ is even and

$$\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{N} \min(d_i, k)$$

holds for every $1 \le k \le N$. What this really says is that the trivial bound for the partial sum $d_1 + \dots + d_k$ of the $k$ largest degrees, obtained by combining the contribution $k(k-1)$ of the links among those first $k$ nodes with $\min(d_{k+1}, k) + \dots + \min(d_N, k)$ for the rest, is in fact optimal. In other words, whenever the condition is satisfied for every $k$, there exists a network with such a degree sequence. A constructive proof of this fact was given by Tripathi, Venugopalan and West in [40]. In fact, Tripathi and Vijay [39] observed that it is enough to check the condition for $k = N$ and for every $k$ such that $d_k > d_{k+1}$.

Good asymptotic bounds for the number of degree sequences of networks on $N$ nodes were given by Burns [10]: there exist constants $c_0$ and $c_1$ such that the number $A_N$ of degree sequences of graphs on $N$ vertices satisfies

$$\frac{4^N}{c_0 N} \le A_N \le \frac{4^N}{(\log N)^{c_1} \sqrt{N}}.$$

It is worth noting, however, that amongst the sensible candidates for degree sequences, i.e., amongst monotone decreasing sequences of the appropriate length, the proportion of those that are degree sequences tends to zero, a fact conjectured by Wilf and proven by Pittel […].

From now on, let $d_1 \ge \dots \ge d_N$ be the degree sequence of a finite network, with distinct values $D_1 > \dots > D_s$ such that $D_i$ occurs $n_i$ times (so $N = n_1 + \dots + n_s$). How best to assign an approximate entropy to a degree sequence? We consider three options in turn:

1. The simplest thing to do, given the lack of a natural ordering of the nodes, is just to compute the approximate entropy of this monotone sequence viewed as a time series.

2. More sophisticated is to assign some combinatorial description of the degree sequence, with potentially interesting entropic properties.

3. For networks with a natural ordering of their vertices, we can compute the approximate entropy of the degree sequence in that ordering. For example, a network that has grown by a process of sequential node addition has a natural ordering of the nodes according to when they were added. An example of this is the standard Barabási–Albert model of preferential attachment [3].



A. The degree sequence as a monotone time series 

Option (1) does not seem to match the intuition of what an entropic sequence should be: indeed, if $m$ and $m+1$ divide $N$ then, in the small $r$ limit, the behaviour captures the dimension more than the disorder in the sequence. This feature is highlighted by the following observation: suppose that $D_i - D_{i+1} > r$; then $\mathrm{ApEn}(m, r, N)$ (with $m$ and $m+1$ dividing $N$) is maximal when $s = N/m$ and $n_1 = \dots = n_s = m$, and minimal when $n_1 = \dots = n_s = m + 1$. Let us sketch how to show this statement. We may take $0 < r < 1$ and suppose the $D_i$ to be integers. Optimizing $\mathrm{ApEn}(m, r, N) = \Phi_{m+1} - \Phi_m$ for a degree sequence $d$ amounts to optimizing the ratio $R$ of the geometric means of the numbers of points of $X_m$ (respectively $X_{m+1}$) within $\|\cdot\|_\infty$ distance $r$ of each point. Hence,

$$R = \frac{\left(\prod_{x \in X_m} |\{y \in X_m : \|x - y\|_\infty \le r\}|\right)^{1/|X_m|}}{\left(\prod_{x \in X_{m+1}} |\{y \in X_{m+1} : \|x - y\|_\infty \le r\}|\right)^{1/|X_{m+1}|}}.$$

In this case, $0 < r < 1$ implies that we need only optimize

$$R = \frac{\left(\prod_{i=1}^{s} A_i(m)^{A_i(m)}\right)^{1/|X_m|}}{\left(\prod_{i=1}^{s} A_i(m+1)^{A_i(m+1)}\right)^{1/|X_{m+1}|}},$$

i.e.,

$$R = \frac{\prod_{i=1}^{s} A_i(m)^{A_i(m)/|X_m|}}{\prod_{i=1}^{s} A_i(m+1)^{A_i(m+1)/|X_{m+1}|}}.$$

Here $A_i(m) := \max(1, n_i - (m-1))$, with $n_1 + \dots + n_s = N$ and $1 \le s \le N$. We are also free to sort the $n_i$ so that $n_1 \ge \dots \ge n_s$. Now suppose that there are $j \le s$ indices $i$ such that $A_i(m) = n_i - (m-1) > 1$, and $k \le j$ indices $i$ such that $A_i(m+1) = n_i - m > 1$. Setting $r_i = n_i - (m-1)$, we have $r_1 \ge \dots \ge r_k \ge 3 > 2 = r_{k+1} = \dots = r_j > 1 \ge r_{j+1} \ge \dots \ge r_s$. The ratio $R$ now reads

$$R = \prod_{i=1}^{k} \frac{r_i^{\,r_i/|X_m|}}{(r_i - 1)^{(r_i - 1)/|X_{m+1}|}} \prod_{i=k+1}^{j} 2^{2/|X_m|}.$$

The statement follows by writing out the behaviour of $x^{x/|X_m|}/(x-1)^{(x-1)/|X_{m+1}|}$ for an integer $x$ at least 2.

Thus this gives only an idea of proximity to such nonintuitive entropic sequences. For example, with $0 < r < 1$ and $m = 0$, the approximate entropy is maximal for the sequence $(N-1, N-2, \dots, \lfloor N/2 \rfloor, \lfloor N/2 \rfloor, \dots, 2, 1)$; if $m = 1$ and $N = 9$, the maximum is realized by the sequence $(8, 8, 7, 7, 6, 6, 4, 4, 4)$.

A graph is regular if all its vertices have the same degree. The frequency partition of a graph is the partition of its vertices into blocks of equal degree. Such a notion is a graph invariant, but intuitively there are many non-isomorphic graphs with the same frequency partition. It is known that every partition is the frequency partition of some graph, with the exception of $(1, 1, \dots, 1)$ (see [35]). A graph has a regular frequency partition if each block of the partition is of the same size. In general, if $m = 1$, the graphs that realize the maximum have degree sequence $(N-1, N-1, N-2, N-2, \dots, N - \frac{N+1}{2}, N - \frac{N+1}{2}, \frac{N-1}{2})$ when $N$ is odd and $(N-1, N-1, N-2, N-2, \dots, N/2, N/2)$ when $N$ is even (we take $N \ge 4$).
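The $m = 0$ example can be checked by brute force for small $N$. With $0 < r < 1$ and integer degrees, $\mathrm{ApEn}(0, r, N) = \Phi_1(r)$ is simply the Shannon entropy of the frequencies $n_i/N$ of the degree values. The Python sketch below (helper names are hypothetical; $N = 5$ is an arbitrary small case) enumerates all non-increasing sequences on $N$ nodes, keeps the graphical ones via the Erdős–Gallai test, and reports the maximizers.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def is_graphical(d):
    """Erdős–Gallai test for a non-increasing integer sequence d."""
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    return all(
        sum(d[:k]) <= k * (k - 1) + sum(min(x, k) for x in d[k:])
        for k in range(1, n + 1)
    )


def apen0(d):
    """ApEn(0, r, N) = Phi_1(r) for 0 < r < 1: Shannon entropy of the degree values."""
    n = len(d)
    return -sum(c / n * math.log(c / n) for c in Counter(d).values())


N = 5
graphical = [c[::-1] for c in combinations_with_replacement(range(N), N) if is_graphical(c[::-1])]
best = max(apen0(d) for d in graphical)
maximizers = sorted(d for d in graphical if abs(apen0(d) - best) < 1e-12)
print(maximizers)  # [(3, 2, 2, 1, 0), (4, 3, 2, 2, 1)]
```

Besides $(4, 3, 2, 2, 1)$, the degree sequence of the complementary graph, $(3, 2, 2, 1, 0)$, attains the same value, since both have exactly one repeated degree.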

B. The slide of a degree sequence 

Let us now explore option (2). Given a degree sequence $d = (d_1, \dots, d_N)$, for each $i = 1, \dots, N-1$ write down $0$ if $d_i = d_{i+1}$, and otherwise write a string of $d_i - d_{i+1}$ ones if $d_i > d_{i+1}$. We denote the resulting sequence by $\mathrm{slide}(d)$; for a network $G$ with degree sequence $d$, we write $\mathrm{slide}(G)$. Note that for a network $G$ on $N$ nodes, $N - 1 \le \#\mathrm{slide}(G) \le 2(N-2)$, with the minimum length attained by (for example) regular networks and the maximum attained by stars (i.e. complete bipartite networks with a singleton as one class of the bipartition). Thus a degree sequence with $s$ distinct degrees is associated to a binary sequence of $N - s$ zeros and $d_1 - d_N$ ones, and the collection of sequences associated to networks with $N$ nodes is a certain subset of the binary sequences with up to $N$ zeros and up to $N - 1$ ones. For example, the degree sequence $(4, 3, 3, 3, 1)$ is encoded as $(1, 0, 0, 1, 1)$.
The associated binary code has a simple interpretation in terms of "sliding" down the degree sequence: $0$ means "go horizontally right" and $1$ means "continue going down". Note that not every binary sequence arises as the slide of some network. For example, there is no network $G$ with $\mathrm{slide}(G) = 001$: indeed, such a degree sequence must satisfy $d_1 = d_2 = d_3 = d_4 + 1$, and none of the possibilities $(3, 3, 3, 2)$, $(2, 2, 2, 1)$ or $(1, 1, 1, 0)$ are graphical because they all have odd sum. Obviously the slide map is not injective; further, networks with different numbers of nodes can have the same slide. The approximate entropy of binary sequences was studied by Pincus [33] in the context of developing properties of normal numbers.
The notion of approximate entropy allows us to compare binary sequences. The language used by Pincus to do so is as follows: a binary sequence $u \in \{0, 1\}^N$ of length $N$ is called $(m, N)$-random if $\mathrm{ApEn}(m, N)(u)$ is maximal amongst all binary sequences of the same length. Let $m^*(N)$ be the largest integer such that $2^{2^{m^*(N)}} \le N$, and call a binary sequence $u$ $N$-random if it is $(m, N)$-random for $m = 0, 1, 2, \dots, m^*(N)$. Finally, we can compare two binary sequences $u$ and $v$ of the same length $N$ and say that $u$ is more $N$-random than $v$ if $\mathrm{ApEn}(m, N)(u) \ge \mathrm{ApEn}(m, N)(v)$ holds for all $m$ such that $1 \le m \le m^*(N)$.
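The slide construction and the non-example $001$ can be checked mechanically. The Python sketch below (function names are ours) computes $\mathrm{slide}(d)$, tests graphicality with the Erdős–Gallai criterion recalled in Section III, and enumerates all slides of a given length $n$. Since $\#\mathrm{slide}(G) \ge N - 1$, only networks with $N \le n + 1$ nodes need to be examined. It also computes the proportion $s_n$ of binary strings of length $n$ that arise as slides, the quantity tabulated below for $n = 2, \dots, 6$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def slide(d):
    """Slide of a degree sequence: 0 for each tie, d_i - d_{i+1} ones for each strict drop."""
    d = sorted(d, reverse=True)
    bits = []
    for a, b in zip(d, d[1:]):
        bits.extend([0] if a == b else [1] * (a - b))
    return tuple(bits)


def is_graphical(d):
    """Erdős–Gallai test (d is sorted into non-increasing order first)."""
    d = sorted(d, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    return all(
        sum(d[:k]) <= k * (k - 1) + sum(min(x, k) for x in d[k:])
        for k in range(1, len(d) + 1)
    )


def slides_of_length(n):
    """All binary strings of length n that are slides of some network (N <= n + 1 nodes)."""
    found = set()
    for N in range(1, n + 2):
        for c in combinations_with_replacement(range(N), N):
            s = slide(c)
            if len(s) == n and is_graphical(c):
                found.add(s)
    return found


def slide_fraction(n):
    """Proportion s_n of binary strings of length n that arise as slides."""
    return Fraction(len(slides_of_length(n)), 2 ** n)


print(slide((4, 3, 3, 3, 1)))            # (1, 0, 0, 1, 1)
print((0, 0, 1) in slides_of_length(3))  # False
print([slide_fraction(n) for n in range(2, 7)])
```

`Fraction` prints in lowest terms, so the last line should be compared with the reported values after reduction (e.g. 20/32 = 5/8).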

A characterization of binary sequences of the largest approximate entropy was also given by Pincus: for $N \ge 5$, the $N$-random binary sequences amount to equivalence classes of sequences of length $N$ of a partially exchangeable process in which "approximate stability of frequencies holds", in the sense that

$$\left| \frac{\#\{(a_0, \dots, a_m)\text{-blocks in the sequence}\}}{N - m} - \frac{1}{2^{m+1}} \right|$$

is as small as possible for each block type $(a_0, \dots, a_m) \in \{0, 1\}^{m+1}$ and for every $0 \le m \le m^*(N)$. That such sequences asymptotically have large approximate entropy can be seen immediately from Rukhin's characterization. What this says is that the binary sequences of maximal approximate entropy amount to optimal "truncations" of normal numbers written in base 2.

If asymptotically almost all binary sequences arose as slides of some degree sequence (even though the proportion of binary sequences of length $n$ that are $n$-random tends to 0), we could expect to find networks whose slides are arbitrarily close to $n$-random ones (it is unknown for which $n$ there exist networks with $n$-random slides). In particular, we conjecture the following result:

Conjecture. The probability that a uniformly chosen binary sequence of length $n$ is the slide of a graphical degree sequence tends to 1 as $n \to \infty$.

A potentially fruitful approach to a proof is as follows. Firstly, one may consider Pittel's approach to proving the Wilf conjecture, which refines the insight of Erdős–Richmond in associating an integral estimate to the probability of surviving the Nash-Williams graphicality condition. Subsequently, one should combine the Kolmogorov 0-1 law with a sample family of networks whose slides make an asymptotically nonzero contribution. An alternative is to try to show that, with high probability, one can construct a network (perhaps via the arguments of […]) exploiting the "slack" gifted by the considerable non-uniqueness of slides. Writing $s_n$ for the proportion of binary sequences of length $n$ that are slides of some network, we have computed the first few terms $s_2, s_3, \dots, s_6$; the respective values are 3/4, 3/8, 13/16, 20/32, 58/64. We are now ready to define the slide entropy of a network:

Definition. If $G$ is a network with $N$ nodes, then the slide entropy of $G$ is

$$\mathrm{SlideApEn}(G) := \left(\mathrm{ApEn}(\mathrm{slide}(G), m, r)\right)_{m=1}^{m^*(N)}$$

for any $0 < r < 1$.

This notion is the topic of the next section. 
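A minimal sketch of the slide entropy, assuming Python and hypothetical function names. With $0 < r < 1$, two binary blocks lie within distance $r$ only when they are identical, so $\Phi_k$ reduces to the Shannon entropy of the (non-circular) length-$k$ block frequencies. The upper embedding dimension `m_max` is left to the caller rather than fixed to $m^*(N)$, and the Erdős–Rényi sample ($N = 300$, $p = 0.05$, seed 0) is an arbitrary illustration.

```python
import math
import random
from collections import Counter


def slide(d):
    """Slide of a degree sequence: 0 for each tie, d_i - d_{i+1} ones for each strict drop."""
    d = sorted(d, reverse=True)
    bits = []
    for a, b in zip(d, d[1:]):
        bits.extend([0] if a == b else [1] * (a - b))
    return bits


def binary_apen(bits, m):
    """ApEn(m, r, N) of a 0/1 sequence for any 0 < r < 1 (exact block matching)."""
    def phi(k):
        if k == 0:
            return 0.0
        n = len(bits) - k + 1
        counts = Counter(tuple(bits[t:t + k]) for t in range(n))
        return -sum(c / n * math.log(c / n) for c in counts.values())

    return phi(m + 1) - phi(m)


def slide_apen(degrees, m_max):
    """Vector (ApEn(slide(G), m, r))_{m=1..m_max}; m_max is supplied by the caller."""
    s = slide(degrees)
    return [binary_apen(s, m) for m in range(1, m_max + 1)]


# Degrees of one Erdős–Rényi sample G(N, p).
rng = random.Random(0)
N, p = 300, 0.05
deg = [0] * N
for i in range(N):
    for j in range(i + 1, N):
        if rng.random() < p:
            deg[i] += 1
            deg[j] += 1
print(slide_apen(deg, 3))
print(slide_apen([3] * 10, 2))  # regular degree sequence, slide of zeros only: [0.0, 0.0]
```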



Figure IV.1 (panels: Scale free; Poisson): Circularly arranged slides of random scale-free and Poisson networks of N = 50 nodes, illustrating typical structure. Note that whilst N = 50 is perhaps far too small to meaningfully refer to the distribution as scale free, the figure is intended only to illustrate the general appearance of the slides.



IV. APPROXIMATE ENTROPY OF SLIDE 
SEQUENCES: APPLICATION TO STATIC 
COMPLEX NETWORKS 

Scale-free networks are of interest due to their abundance in nature and technology […]. Computationally, estimates of their behaviour as complex networks are often made by treating their degree sequences as random variables sampled from a scale-free distribution $\pi(x) = (\gamma - 1)x^{-\gamma}$ (for real $x \ge 1$). A natural construction via growth through preferential attachment was popularized by Barabási and Albert […]. The probability distribution this yields on networks of $N$ nodes (for each $N$) is quite distinct from simply sampling a scale-free distribution and trying to assemble a network from that, the so-called configuration model [24]. Scale-free networks are often compared with an older and very well studied notion of random network introduced by Erdős and Rényi, constructed by adding each edge with a fixed probability $p$. For $N$ nodes, with $\lambda = Np$, their degree distribution is asymptotically Poisson, $\pi(x) = \lambda^x e^{-\lambda}/x!$ $(x \ge 0)$. Also of interest are random networks with exponential degree distributions, $\pi(x) = \lambda e^{-\lambda x}$ $(x \ge 0)$, which naturally arise as the renormalization group fixed point of visibility graphs associated to random uncorrelated time series […].

The approximate entropy of a generic slide with $d_1 - d_N$ ones and $N - s$ zeros can be computed by considering the Markov chain with state space $\mathcal{X} = \{0, 1\}$ and transition matrix $P$ given by $P_{00} = P_{10} = p$ and $P_{11} = P_{01} = q = 1 - p$ (thus $\pi(0) = p$ and $\pi(1) = q$). We recall the following result of Pincus. For a first-order stationary Markov chain with discrete state space $\mathcal{X} \subset \mathbb{N}$, transition probabilities $P_{xy} = \mathbb{P}(\text{travel from } x \text{ to } y)$ and stationary distribution $\pi$, with $r < \min_{x \ne y \in \mathcal{X}} |x - y|$, almost surely for every $m$

$$\lim_{N \to \infty} \mathrm{ApEn}(m, r, N) = -\sum_{x, y \in \mathcal{X}} \pi(x) P_{xy} \log P_{xy}.$$

It follows that, for a generic slide in which the proportion of zeros is $p$,

$$\lim_{N \to \infty} \mathrm{ApEn}(m, r, N) = H(p),$$

where as usual $H(p) = -\left(p \log p + (1 - p) \log(1 - p)\right)$.
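The step from Pincus's Markov-chain limit to $H(p)$ uses only the fact that both rows of $P$ equal $(p, q)$. The Python sketch below checks this numerically and also simulates one long generic slide (each symbol $0$ with probability $p$, independently), for which $\mathrm{ApEn}(1, r, N)$ with $0 < r < 1$ equals the difference of the entropies of length-2 and length-1 block frequencies. The value $p = 0.3$, the length $10^5$ and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def markov_apen_limit(P, pi):
    """Pincus's limit -sum_{x,y} pi(x) P_xy log P_xy for a stationary Markov chain."""
    return -sum(
        pi[x] * P[x][y] * math.log(P[x][y])
        for x in range(len(P)) for y in range(len(P)) if P[x][y] > 0
    )


def H(p):
    """Binary Shannon entropy in nats."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def block_entropy(bits, k):
    """Shannon entropy of the (non-circular) length-k block frequencies."""
    n = len(bits) - k + 1
    counts = Counter(tuple(bits[t:t + k]) for t in range(n))
    return -sum(c / n * math.log(c / n) for c in counts.values())


p = 0.3
q = 1 - p
P = [[p, q], [p, q]]   # P[x][y]: P_00 = P_10 = p, P_01 = P_11 = q
pi = [p, q]
print(markov_apen_limit(P, pi), H(p))  # equal: identical rows make the symbols i.i.d.

rng = random.Random(0)
bits = [0 if rng.random() < p else 1 for _ in range(100_000)]
apen_1 = block_entropy(bits, 2) - block_entropy(bits, 1)
print(apen_1)  # close to H(p) for long sequences
```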
However, consider a scale-free network on $N$ nodes with degree distribution sampled from a truncated scale-free distribution $\pi(k) \propto k^{-\gamma}$ for $k \in \{1, \dots, N-1\}$. In the large $\gamma$ limit the network is dominated by degree-1 nodes, as we expect to have about $N k^{-\gamma}/\zeta(\gamma)$ nodes of degree $k$, which tends to zero for all $k$ except $k = 1$. This creates large regions of zeros in their slides. Assuming generic behaviour in the region of the slide not accounting for the degree-1 nodes, the ApEn estimate of Rukhin suggests intuitively that the slide entropy should be roughly monotone decreasing in $\gamma$ for each fixed $N$. This intuition can be seen numerically in Fig. IV.1. Estimating $d_1 - d_N$ can be done using the following elementary result. Let $X_1, \dots, X_N$ be drawn from a probability distribution $\pi$ on $\mathbb{R}$ with cumulative distribution function $\Pi$. Then $E_m := \mathbb{E}(m\text{th largest } X_j)$ is given by

$$E_m = \frac{N!}{(N-m)!\,(m-1)!} \int_{\mathbb{R}} x \, \pi(x) \, \Pi(x)^{N-m} \left(1 - \Pi(x)\right)^{m-1} dx.$$

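If we are drawing the degrees $\deg(v)$ of nodes $v \in \{1, \dots, N\}$, then $\mathbb{E}(d_1 - d_N) = E_1 - E_N$ can be computed for the scale-free degree distribution (for large $N$ the smallest degree is 1 almost surely, so $E_N = 1$), with $E_1$ given approximately (following Ghoshal–Barabási [3]) by

$$E_1 \approx (N-1)^{1/(\gamma-1)} \, \Gamma\!\left(\frac{\gamma - 2}{\gamma - 1}\right).$$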

This becomes a good approximation for large $N$, but performs poorly near $\gamma = 2$ due to the pole of the Gamma function at 0. To estimate the expected number $s$ of distinct degrees, we associate to the continuous probability distribution $\pi$ on $\mathbb{R}$ a distribution $\varpi$ on $\mathbb{N}$. An elementary argument (summing, over each value $n$, the probability that $n$ appears in the sample) says that the expected number of distinct values of a random sample of size $N$ from the distribution $\varpi$ is

$$s(N) = \sum_{n \in \mathbb{N}} \left(1 - \left(1 - \varpi(n)\right)^N\right).$$

In this way, for a sample of $N$ random variables from a scale-free distribution $\varpi(n) := n^{-\gamma}/\zeta(\gamma)$, and upon comparing the sum with the corresponding integral, we get an estimate of the form

$$s(N) \approx \int_1^\infty \left(1 - \left(1 - \frac{x^{-\gamma}}{\zeta(\gamma)}\right)^N\right) dx.$$

This provides an analytic expression for the entropy of a generic slide with the same 0-1 distribution, but it is a considerable overestimate for real scale-free slides. The slight "kink" in Fig. IV.2 around $\gamma \approx 2.4$ arises from the change in generic behaviour. We similarly obtain generic estimates for exponential degree sequences by computing

$$E_1 - E_N = \frac{\gamma_E + \log(N-1)}{\lambda} + O\!\left(\frac{1}{N}\right),$$

where $\gamma_E$ denotes the Euler–Mascheroni constant, and

$$s(N) \approx \int_0^\infty \left[1 - \left(1 - \lambda e^{-\lambda x}\right)^N\right] dx.$$

For Poisson networks, we find numerically that the generic estimate provides a good approximation. It is interesting to ask which probability distributions $\varpi$ on $\mathbb{N}$ tend to give rise to networks of the largest and smallest slide entropies. Amongst such distributions on $N$ nodes, we expect that the uniform distribution on $\{0, 1, \dots, N-1\}$ has the greatest typical slide entropy. Note that the point distribution on any fixed $k \in \{0, 1, \dots, N-1\}$ (such that the desired networks exist) always gives rise to networks of zero slide entropy, since their slides consist only of zeros.
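The three estimates above ($E_1$ for a power law, the expected number of distinct values $s(N)$, and $E_1 - E_N$ for exponential degrees) can be sanity-checked by Monte Carlo. The Python/NumPy sketch below does so for one parameter choice each. The continuous power law is sampled as NumPy's Lomax (`Generator.pareto`) plus 1, which gives the classical Pareto density $(\gamma - 1)x^{-\gamma}$ on $[1, \infty)$. The truncation of the discrete power law at $K$, all parameter values and the seed are assumptions of this example. For (c) it uses the standard identity $\mathbb{E}(\max - \min) = H_{N-1}/\lambda$ for $N$ i.i.d. exponential variables, where $H_{N-1}$ is a harmonic number.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# (a) E_1 for the continuous power law pi(x) = (gamma - 1) x^(-gamma), x >= 1.
def e1_powerlaw(N, gamma):
    return (N - 1) ** (1 / (gamma - 1)) * math.gamma((gamma - 2) / (gamma - 1))

N, gamma = 1000, 4.0
x = rng.pareto(gamma - 1, size=(4000, N)) + 1.0   # Lomax + 1 = classical Pareto on [1, inf)
mc_e1 = x.max(axis=1).mean()

# (b) s(N) = sum_n (1 - (1 - w(n))^N) for a discrete power law truncated at K.
def expected_distinct(w, N):
    return float(np.sum(1.0 - (1.0 - w) ** N))

K, g, N_b = 10_000, 2.5, 500
values = np.arange(1, K + 1)
w = values.astype(float) ** -g
w /= w.sum()
s_formula = expected_distinct(w, N_b)
draws = rng.choice(values, size=(2000, N_b), p=w)
s_mc = np.mean([np.unique(row).size for row in draws])

# (c) Exponential degrees: E_1 - E_N = H_{N-1}/lambda = (gamma_E + log(N - 1))/lambda + O(1/N).
lam, N_c = 0.5, 200
exact_range = sum(1.0 / k for k in range(1, N_c)) / lam
approx_range = (0.5772156649015329 + math.log(N_c - 1)) / lam
y = rng.exponential(1.0 / lam, size=(20000, N_c))
mc_range = (y.max(axis=1) - y.min(axis=1)).mean()

print(mc_e1, e1_powerlaw(N, gamma))
print(s_mc, s_formula)
print(mc_range, exact_range, approx_range)
```

Exponent $\gamma = 4$ is used in (a) because near $\gamma = 2$ the maximum has a very heavy tail and, as noted above, the approximation degrades.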



V. APPROXIMATE ENTROPY OF GROWING 
NETWORKS: HORIZONTAL VISIBILITY 
GRAPHS 

Within nonlinear time series analysis, the so-called visibility algorithms […–23] are a family of methods that directly map a given time series of $N$ data into a network of $N$ vertices (a so-called visibility graph), where the edge set is constructed according to specific geometric criteria applied to the data set. Previous works have shown that the information contained in a time series is conserved in, or inherited by, the topology of the associated visibility graph, including nontrivial structures such as chaotic or fractal dynamics. To cite a few examples, within the so-called visibility algorithm approach […, 20], series extracted from a fractional Brownian motion with Hurst exponent $H$ map into scale-free visibility graphs with degree distribution $P(k) \sim k^{-\gamma}$, where the linear relation $\gamma = 3 - 2H$ quantitatively relates the structure of the dynamical process ($H$) to the topology of the associated graph ($\gamma$). Within an alternative approach, coined the horizontal visibility algorithm [16, 22], it was shown […] that horizontal visibility graphs distinguish between correlated stochastic, uncorrelated and chaotic processes, and in each of these cases the visibility graph has an exponential degree distribution $P(k) \sim e^{-\lambda k}$, with the value of $\lambda$ characterizing the particular process. Recently, it has also been suggested that the Shannon entropy of the degree distribution of a horizontal visibility graph is a first-order approximation to the Kolmogorov–Sinai entropy of the associated dynamical system [23].

Figure IV.2: Slide entropies for scale-free and Poisson (Erdős–Rényi) networks of N = 2000 nodes. The generic values are the slide entropies of uniformly random slides with the same numbers of 0s and 1s. Error bars indicate two observed standard deviations.

As a first comment, note that the visibility graph asso- 
ciated to a given time series conserves, by construction, 
the temporal ordering of the data, i.e. temporal correlations amongst the data. This is due to the fact that in the mapping algorithm, each datum $x_i$ maps into a labeled vertex $n_i$, that is to say, a natural ordering of the
vertex set emerges, respecting the temporal correlations 
in the series. The implications are twofold: (i) within 
(horizontal) visibility graphs one has a natural ordering 
of the degree sequence, which allows us to unambiguously 
calculate the approximate entropy of such series, and (ii) 
since that ordering is related to the temporal correla- 
tions of the associated series, the approximate entropy 
of a visibility graph may provide a measure of the asso- 
ciated series complexity, that is, it becomes a measure 
directly related to the original one introduced by Pincus 
in the framework of dynamical systems. In other words, 
whereas our previously defined, network-based, slide en- 
tropy accounts for the heterogeneity of the network it- 
self (with no dynamical/temporal information whatso- 
ever), in the context of visibility graphs this network- 
based measure is indeed capturing some dynamical in- 
formation. Notice also that since each datum in a time 
series is associated to a labeled vertex in the graph, one 
can view a visibility graph as a dynamically growing net- 
work: as time evolves, the dynamical process generates a 
trajectory (time series) whose associated visibility graph 
grows. The approximate entropy of its degree sequence 
accounts for the information stored in the network grow- 
ing process. 

In order to test the aforementioned conjectures, we address, within the so-called horizontal visibility algorithm, three types of time series whose approximate entropy, associated to the amount of information needed to unravel the underlying dynamics, is qualitatively different: periodic series (i.e. regular dynamics with a point-like attractor measure), chaotic series (deterministic dynamics with finite attractor measure) and white noise (stochastic dynamics with infinite attractor measure). We proceed as follows: let $\{x_t\}_{t=1}^{N}$ be a real-valued time series of $N$ data. The horizontal visibility algorithm assigns each datum of the series to a vertex in the horizontal visibility graph (HVg). Two vertices $i$ and $j$ in the graph are connected if one can draw a horizontal line in the time series joining $x_i$ and $x_j$ that does not intersect any intermediate data height. Hence, $i$ and $j$ are two connected nodes if the following geometrical criterion is fulfilled within the time series:

$$x_i, x_j > x_n, \quad \forall n \,|\, i < n < j. \tag{V.1}$$
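Criterion (V.1) translates into a short scan. In the Python sketch below (the function name is ours), for each datum $x_i$ we walk forward while keeping the running maximum of the intermediate data. Vertex $j$ is linked to $i$ when that maximum lies strictly below both $x_i$ and $x_j$, and the scan stops as soon as an intermediate datum reaches the height of $x_i$, since $x_i$ can then see nothing further. The output is the time-ordered degree sequence $k_1, \dots, k_N$ used in the rest of this section.

```python
def hvg_degrees(x):
    """Degree sequence k_1, ..., k_N of the horizontal visibility graph of x, in time order.

    Vertices i < j are linked iff x_i > x_n and x_j > x_n for every i < n < j (criterion V.1).
    """
    n = len(x)
    deg = [0] * n
    for i in range(n - 1):
        top = float("-inf")  # running maximum of the data strictly between i and j
        for j in range(i + 1, n):
            if top < x[i] and top < x[j]:
                deg[i] += 1
                deg[j] += 1
            top = max(top, x[j])
            if top >= x[i]:  # x_i is blocked from every later datum
                break
    return deg


print(hvg_degrees([3, 1, 2]))   # [2, 2, 2]: the middle datum lies below both ends
print(hvg_degrees([1, 0] * 5))  # [2, 2, 4, 2, 4, 2, 4, 2, 3, 1]
```

The second example shows the motif structure of a period-2 series mentioned below: away from the boundaries, the degrees simply alternate between 4 and 2.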

The generated HVg has a degree sequence of the form $\{k_1, k_2, \dots, k_N\}$, where $k_i$ is the degree of vertex $i$, that is to say, associated to datum $x_i$ in the original series (as opposed to the definition in Section III, note that this degree sequence is not monotonically decreasing, since it has the natural ordering explained above). We finally calculate our network-based ApEn. Results for periodic, chaotic and noisy series, for specific values of the ApEn parameters, are summarized in Table I and in Fig. V.1. In this respect, we highlight the following comments:



Series description                       ApEn(2, 2, 2^14)
periodic series (T = 2)                  0.001
chaotic series (logistic map, μ = 4)     0.47
U[0, 1] uncorrelated                     0.62



Table I: Values of ApEn for concrete parameters m = 2, r = 2 and size N = 2^14, for the HVg associated to three types of time series: (i) a periodic series of period 2 (deterministic dynamics with an underlying attractor of zero measure), (ii) a chaotic series extracted from the fully chaotic logistic map $x_{t+1} = 4x_t(1 - x_t)$ (deterministic dynamics with an underlying attractor of finite measure), and (iii) a series of uncorrelated random variables extracted from a uniform distribution U[0, 1] (stochastic dynamics, i.e. dynamics with a hypothetical infinite-dimensional attractor). The approximate entropy of the visibility graphs increases as a function of the associated series information.



• HVg associated with periodic series: by construction,
these networks have a very homogeneous structure,
which can indeed be seen as a concatenation of
a network motif whose structure is intimately
related to the series periodicity.
Accordingly, their ApEn is small, vanishing
for large embedding dimension m. If we
use small values of m, the ApEn statistic
can distinguish several degrees of periodicity,
associated with the heterogeneity of the
root motif of the visibility network.

• Visibility graphs associated with uncorrelated random
white noise: white noise is a maximally entropic
signal according to any well-defined information-theoretic
measure. In a previous work [23], it was
shown that the Shannon entropy of the degree
distribution of an HVg is indeed maximized for
uncorrelated white noise. Here, unlike for periodic series,
results are close to convergence for m = 2. For a
given series size N, noise yields the maximal ApEn,
and this value keeps increasing with the series size N, as
expected.

• Visibility graphs associated with chaotic maps: the
ApEn of the associated networks reaches a nonzero
value, reminiscent of the underlying attractor of the
dynamics. Convergence is reached for m = 2 and,
unlike for noise, convergence as a function of series size
is also reached here, as expected.

This preliminary analysis suggests that the network
structure captures the inherent complexity of the associated
time series, which supports the well-posedness of
visibility graphs as a tool for time series analysis.
Based on this conclusion, we can also point out that the
network-theoretic variant of the approximate entropy
statistic can effectively distinguish different network
structures according to their associated 'complexity'.
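The experiment behind Table I can be sketched end to end as follows, under stated assumptions. `apen` follows Pincus' standard definition, $\mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$ with $\Phi^m(r) = (N-m+1)^{-1}\sum_i \log C_i^m(r)$, using the max-norm distance and counting self-matches. The helper names (`hvg_degrees`, `apen`, `sample_series`), the period-2 values 0.2/0.8, the random initial condition of the logistic map and the small size N = 500 are all illustrative choices, not taken from the paper. Because N is much smaller than 2^14 and the pairwise loop is O(N^2), the printed numbers should not be expected to reproduce Table I exactly.

```python
import math
import random


def hvg_degrees(x):
    """Time-ordered degree sequence of the horizontal visibility graph of x."""
    n = len(x)
    deg = [0] * n
    for i in range(n - 1):
        deg[i] += 1          # consecutive data always see each other
        deg[i + 1] += 1
        top = x[i + 1]       # highest datum strictly between i and j
        for j in range(i + 2, n):
            if top >= x[i]:
                break        # the view from x_i is blocked from here on
            if x[j] > top:
                deg[i] += 1
                deg[j] += 1
            top = max(top, x[j])
    return deg


def _phi(u, m, r):
    """Phi^m(r): average log-fraction of length-m templates within distance r."""
    n = len(u) - m + 1
    templates = [u[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # C_i^m(r), self-match included, max-norm distance
        c = sum(1 for b in templates if max(abs(p - q) for p, q in zip(a, b)) <= r)
        total += math.log(c / n)
    return total / n


def apen(u, m=2, r=2):
    """Approximate entropy ApEn(m, r, N) of the sequence u (Pincus)."""
    return _phi(u, m, r) - _phi(u, m + 1, r)


def sample_series(kind, n, seed=0):
    """Hypothetical generator for the three series types of Table I."""
    rng = random.Random(seed)
    if kind == "periodic":                       # period-2 series
        return [0.2 if t % 2 == 0 else 0.8 for t in range(n)]
    if kind == "chaotic":                        # fully chaotic logistic map
        x, out = rng.random(), []
        for _ in range(n):
            x = 4.0 * x * (1.0 - x)
            out.append(x)
        return out
    return [rng.random() for _ in range(n)]      # U[0, 1] white noise


if __name__ == "__main__":
    for kind in ("periodic", "chaotic", "noise"):
        print(kind, round(apen(hvg_degrees(sample_series(kind, 500))), 3))
```

With r = 2 every length-2 template of the period-2 degree sequence lies within distance r of almost every other one, which is why its ApEn is essentially zero, whereas the broad degree distribution of the noise HVg leaves many mismatches.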
Figure V.1: Top: sample series extracted from (left) the fully chaotic logistic map $x_{t+1} = 4x_t(1 - x_t)$ and (right) uncorrelated
random variables drawn from U[0, 1]. Below these, we show a sample of the degree sequences of the associated HVgs. Bottom: values of
ApEn(2, 2, N) of each HVg as a function of the series size N. Note that N is shown on a logarithmic scale. The values for the
graph associated with the noisy process are larger than those associated with the chaotic process, in agreement with the entropy
of the underlying dynamical process.



Mixed statistic: So far, two alternative ways have been 
considered in order to define the sequence over which the 
approximate entropy is computed, namely (i) networks 
without a predefined time arrow: computation is per- 
formed over the slide of the (monotonically decreasing) 
degree sequence, and (ii) growing networks: computation 
is performed over the (time ordered) degree sequence. A 
mixed approach consists in computing ApEn over the 
slide of a time ordered degree sequence: we consider an 
example of this next. 



VI. APPROXIMATE ENTROPY AND 
BIOLOGICAL NETWORKS: AN APPLICATION 
TO CANCER GENOMICS 

Finally, we focus on a potential application of these 
ideas to the field of cancer genomics. A key feature of 
cancer genomes is the abnormal copy number of genes. 
Healthy cells are diploid, so they carry two copies of each
gene; in cancer cells, however, genes may be deleted or
present in multiple copies. Genes also have a natural
ordering, since they can be located at specific positions
on the genome. Thus, for each tumour we can measure
a copy-number profile along the genome. For technical
reasons this is represented as a continuous-valued
variable (segmented data) [J], with neighboring genes more
likely to have the same value. This copy-number profile
varies along the genome and can therefore be mapped
to a time series, where genomic position plays the role of
time. Thus, an HVg can be constructed for this genomic
series of copy-number values. We hypothesized that this
HVg construction could encapsulate important information
concerning the distribution and shape of the copy-number
profile of each individual tumour, a hypothesis
that we test a posteriori by correlating the resulting
entropy scores with known cancer phenotypes.

As a data set we considered the copy-number data of
171 breast cancer patients [4], for which three phenotypic
categories were available: estrogen receptor status (ER),
whether or not the patient's tumour metastasized (DM)
and histological grade (three levels, representing the degree
of differentiation away from normal healthy tissue). For each tumour
we computed the slide and ordered entropies of the
HVg and asked whether these differed between
phenotypes.

For brevity, we shall say that two phenotypes are
distinguished by some associated quantity if the means of
that quantity for the two phenotypes are statistically
significantly different (say, at the 5% level). Interestingly, we
find that both entropies increase with the grade of the cancer
and in the case of distal metastasis (DM). Moreover,
the slide entropy distinguishes grade 1 breast cancer
from grades 2 and 3 (Welch t-test with p-value 0.026
between grades 1 and 2, and p-value 0.006 between grades 1
and 3), whereas the ordered approximate entropy of the
degree sequence of the HVg distinguishes grades 1 and 2
from grade 3 (p-value < 0.001 between grades 2 and 3). Both
slide and HVg-ordered entropies distinguish ER-negative
(0) from ER-positive (1) breast cancer, whilst neither
distinguishes DM-0 from DM-1. These results show that
the combination of ApEn and the HVg construction can indeed
capture interesting clinico-pathological features of cancer genomes.
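The group comparisons above rely on Welch's t-test, which does not assume equal variances in the two phenotype groups. As a minimal, standard-library sketch (the function name `welch_t` is hypothetical, and no patient data are reproduced here), the statistic and the Welch-Satterthwaite degrees of freedom can be computed as follows; the two-sided p-value is then read off a Student t distribution with `df` degrees of freedom, which in practice is what `scipy.stats.ttest_ind(a, b, equal_var=False)` returns.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Here `a` and `b` would hold the per-tumour entropy values of two phenotype groups (for instance grade 1 and grade 3).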

We conclude this section by mentioning that the ordinary
approximate entropy of the copy-number data itself
(segmented and viewed as a time series) can also distinguish
grades 1 and 2 from grade 3 breast cancer, but not
grade 1 from grade 2. Our analysis is reported in Fig.



VII. CONCLUSIONS 

The original definition of approximate entropy quantifies
the structure of a system's underlying phase space
by looking at time-evolving trajectories over that phase
space; hence it requires a time-ordered sequence. If the
system under study is a network, its structure can still
be studied with approximate entropy, but this
requires some modifications or assumptions. If the
network under study is generated by a dynamical
process, then a natural time ordering can be defined over
the vertex set, and ApEn extends naturally to this
domain: such is the case for visibility graphs, Barabasi-Albert
networks and, in general, any kind of dynamically growing
network. Conversely, if the network under study is static,
the extension of ApEn is not straightforward. In
this case we can still compute the approximate entropy of
various network parameters, but this requires making
some choices. In this work we have computed the
approximate entropy of parameters related to the degree
sequence. In an attempt to capture only the relevant
information, we have introduced the slide sequence of a
network. We have shown that approximate entropy
distinguishes between the usual ensembles of Poisson
and scale-free networks.

Moving from static to dynamical networks, we have
focused on horizontal visibility graphs. Our findings
suggest that the approximate entropy of these objects is
intimately related to the amount of information that the
underlying dynamical process generates. Indeed, we
have given further evidence that the degree sequence
of an HVg alone has the power to discriminate between
different dynamical processes. Finally, we have applied our
statistic to specific HVgs generated by genomic data. We
considered cancer networks because their amount
of disorder, as measured by entropic quantities, has already
been used as a classifier [38]. The apparent capacity
of this statistic to distinguish between several grades of
cancer should certainly be clarified in further work.
Observing analogues of phase transitions in statistics related
to this type of data remains a challenge. Let us briefly
conclude with two open problems. The first concerns
the interpretation of approximate entropy as defined here;
the second concerns identifying naturally ordered sequences
associated with networks.

• What does the slide ApEn tell us about a network?
If we take a series and measure the ApEn of its visibility
graph, what information do we learn about the
dynamical process that generated the series? There
should be a clear correlation between the ApEn of
the series and that of its visibility graph. Such
a correlation may be used to determine what kind
of information about a series is not seen by the
visibility graph approach. This would uncover the
limits of time series analysis methods based on
visibility graphs.

• We have already mentioned that ApEn makes sense
only when applied to an ordered list of numbers
obtained from a network. The first intuitive choice
was the degree sequence arranged in non-increasing
order. Clearly, this is not only the most
natural choice, but also the easiest one. There
are many potential generalizations based on
different criteria. From a dynamical viewpoint, one
could label the vertex set and then generate random
walks over a given network. These are analogous
to trajectories running over a dynamical system's
phase space. ApEn is then computed over these
trajectories.

From a combinatorial viewpoint, the most straightforward
generalization consists of looking at the
second and third neighbours, and so on. These are
sometimes called the shells of a node. In this way,
we may consider the ApEn of the sequences generated
by the number of (deterministic) walks of a given
length. A variety of choices is then available:
looking at the sequence of the number of walks of
growing length, counting the number of walks starting
from different nodes, etc. Remarkably, this
allows us to associate various sequences with each node,
which can then be compared, averaged, etc. The
gain is the possibility of introducing network
parameters that quantify the disorder in the cycle
structure of the graph. Graphs with a particularly
disordered cycle structure, like for example the
controllable graphs introduced in , are expected to
have a higher ApEn of their walks.
Figure VI.1: Slide entropy (m = 1) and ordered degree-sequence entropy (m = 1 and r < 1) of the degree sequences of HVgs
constructed from the copy-number data of 171 breast cancer patients, split according to phenotype. ER denotes estrogen receptor
status, indicating whether the cancer cells depend on estrogen for their growth; GRADE indicates the extent to which the cells
have differentiated away from normal cells; and DM denotes whether or not the patient has suffered distal metastasis (i.e.
whether the cancer has spread). The bars represent the 5% and 95% quantiles of the distribution.



If, instead of combinatorial criteria, we aim at a
more algebraic perspective, a first choice consists
of taking the spectrum of a matrix that faithfully
represents the network, like the adjacency matrix or
a Laplacian. Indeed, the spectrum of a network is a
graph invariant and a naturally ordered sequence.
A very superficial analysis based on Fig. VII.1
suggests that the ApEn of spectra does not contain valuable
information, or at least not information that is easy
to interpret. Hence, it remains an open problem
to determine what kind of network properties are
identified by computing the ApEn of spectra, and whether this
quantity grasps something about different matrix
ensembles.
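The two families of candidate sequences proposed above can be generated with a few lines of code. This is only a sketch under explicit assumptions: the graph is given as adjacency lists over integer vertex labels 0..n-1 (that labelling is itself one of the arbitrary choices mentioned above), and the helper names `walk_counts` and `random_walk` are hypothetical. `walk_counts` returns the number of walks of length 1, 2, ... starting at a node, i.e. the entries $(A^k \mathbf{1})_i$ obtained by repeated multiplication with the adjacency matrix $A$; `random_walk` returns the sequence of vertex labels visited by a simple random walk. Either output could then be passed to any ApEn implementation.

```python
import random


def walk_counts(adj, node, max_len):
    """Number of walks of length k = 1..max_len starting at `node`,
    i.e. (A^k 1)_node, computed as v_k = A v_{k-1} with v_0 = 1."""
    v = [1] * len(adj)  # v_0: all-ones vector
    counts = []
    for _ in range(max_len):
        v = [sum(v[j] for j in adj[i]) for i in range(len(adj))]
        counts.append(v[node])
    return counts


def random_walk(adj, start, steps, seed=0):
    """Vertex labels visited by a simple random walk of `steps` steps."""
    rng = random.Random(seed)
    node, trajectory = start, [start]
    for _ in range(steps):
        node = rng.choice(adj[node])  # uniform step to a neighbour
        trajectory.append(node)
    return trajectory


if __name__ == "__main__":
    triangle = [[1, 2], [0, 2], [0, 1]]
    print(walk_counts(triangle, 0, 5))
    print(random_walk(triangle, 0, 10))
```

Python integers do not overflow, so walk counts of large length can be computed exactly, although they grow roughly like the largest adjacency eigenvalue raised to the walk length.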